Morphological Disambiguation by Voting Constraints
Abstract
We present a constraint-based morphological disambiguation system in which individual constraints vote on matching morphological parses, and disambiguation of all the tokens in a sentence is performed at the end by selecting parses that receive the highest votes. This constraint application paradigm makes the outcome of the disambiguation independent of the rule sequence, and hence relieves the rule developer from worrying about potentially conflicting rule sequencing. Our results for disambiguating Turkish indicate that using about 500 constraint rules and some additional simple statistics, we can attain a recall of 95-96% and a precision of 94-95% with about 1.01 parses per token. Our system is implemented in Prolog and we are currently investigating an efficient implementation based on finite state transducers.

1 Introduction

Automatic morphological disambiguation is an important component in higher level analysis of natural language text corpora. There have been a large number of studies in tagging and morphological disambiguation using various techniques such as statistical techniques, e.g., (Church, 1988; Cutting et al., 1992; DeRose, 1988), constraint-based techniques (Karlsson et al., 1995; Voutilainen, 1995b; Voutilainen, Heikkilä, and Anttila, 1992; Voutilainen and Tapanainen, 1993; Oflazer and Kuruöz, 1994; Oflazer and Tür, 1996) and transformation-based techniques (Brill, 1992; Brill, 1994; Brill, 1995).

This paper presents a novel approach to constraint-based morphological disambiguation which relieves the rule developer from worrying about conflicting rule ordering requirements. The approach depends on assigning votes to constraints according to their complexity and specificity, and then letting constraints cast votes on matching parses of a given lexical item. A matching constraint does not immediately change the set of morphological parses.
Only after all applicable rules have been applied to a sentence are all tokens disambiguated in parallel. Thus, the outcome of the rule applications is independent of the order of rule applications. The rule ordering issue has been discussed by Voutilainen (1994), but he has recently indicated [1] that insensitivity to rule ordering is not a property of their system (although Voutilainen (1995a) states that it is a very desirable property) but rather is achieved by extensively testing and tuning the rules.

In the following sections, we present an overview of the morphological disambiguation problem, highlighted with examples from Turkish. We then present our approach and results. We finally conclude with a very brief outline of our investigation into efficient implementations of our approach.

2 Morphological Disambiguation

In all languages, words are usually ambiguous in their parts-of-speech or other morphological features, and may represent lexical items of different syntactic categories, or morphological structures depending on the syntactic and semantic context. In languages like English, there are a very small number of possible word forms that can be generated from a given root word, and a small number of part-of-speech tags associated with a given lexical form. On the other hand, in languages like Turkish or Finnish with very productive agglutinative morphology, it is possible to produce thousands of forms (or even millions (Hankamer, 1989)) from a given root word, and the kinds of ambiguities one observes are quite different from those observed in languages like English.

In Turkish, there are ambiguities of the sort typically found in languages like English (e.g., book/noun vs book/verb type). However, the agglutinative nature of the language usually helps resolution of such ambiguities due to the restrictions on morphotactics of subsequent morphemes. On the...

[1] Voutilainen, private communication.
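The voting scheme outlined in the introduction can be illustrated with a minimal sketch. It is written in Python for illustration only (the system itself is described as implemented in Prolog) and is not the authors' rule formalism. The names `Constraint`, `has`, and `disambiguate`, the `+`-separated parse strings, and the vote values are all assumptions made for this example. Each constraint is a pattern over consecutive tokens; when it matches, it adds its vote to the matching parses, and parses are selected only after every constraint has voted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Parse = str  # hypothetical encoding: "root+Feature+Feature"
Predicate = Callable[[Parse], bool]


@dataclass(frozen=True)
class Constraint:
    pattern: Sequence[Predicate]  # one predicate per consecutive token
    vote: int                     # assumed higher for more specific rules


def has(feature: str) -> Predicate:
    """Hypothetical helper: true if the parse carries the given feature."""
    return lambda parse: feature in parse.split("+")


def disambiguate(sentence: List[List[Parse]],
                 constraints: Sequence[Constraint]) -> List[List[Parse]]:
    # One vote counter per candidate parse of each token.
    votes = [[0] * len(parses) for parses in sentence]

    # Phase 1: every constraint casts votes; no parse is removed yet.
    for c in constraints:
        width = len(c.pattern)
        for start in range(len(sentence) - width + 1):
            # For each pattern position, the indices of parses it accepts.
            matches = [
                [i for i, p in enumerate(sentence[start + k]) if pred(p)]
                for k, pred in enumerate(c.pattern)
            ]
            if all(matches):  # every position has at least one matching parse
                for k, idxs in enumerate(matches):
                    for i in idxs:
                        votes[start + k][i] += c.vote

    # Phase 2: all tokens are resolved together from the final tallies.
    result = []
    for parses, tally in zip(sentence, votes):
        best = max(tally, default=0)
        result.append([p for p, v in zip(parses, tally) if v == best])
    return result


# Toy example based on the book/noun vs book/verb ambiguity.
sentence = [["the+Det"], ["book+Noun", "book+Verb"]]
rules = [
    Constraint(pattern=[has("Det"), has("Noun")], vote=2),  # more specific
    Constraint(pattern=[has("Verb")], vote=1),              # less specific
]
print(disambiguate(sentence, rules))  # [['the+Det'], ['book+Noun']]
```

Because votes are only summed and selection happens once at the end, reordering `rules` leaves the result unchanged in this sketch, which is the order-independence property the text describes. Ties keep every top-scoring parse, so a token may retain more than one parse.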
Similar resources
Implementing Voting Constraints With Finite State Transducers
We describe a constraint-based morphological disambiguation system in which individual constraint rules vote on matching morphological parses followed by its implementation using finite state transducers. Voting constraint rules have a number of desirable properties: The outcome of the disambiguation is independent of the order of application of the local contextual constraint rules. Thus the r...
Disambiguation of Standardized Personal Name Variants
A growing body of research addresses name disambiguation as part of coreference and entity resolution systems, but the systems do not robustly resolve the ambiguity introduced by standardized personal name variants, or nicknames. In many languages, personal name variants are governed by morphological and phonological constraints, providing a dataset rich in features which may be used to train a...
Tagging English by Path Voting Constraints
We describe a constraint-based tagging approach where individual constraint rules vote on sequences of matching tokens and tags. Disambiguation of all tokens in a sentence is performed at the very end by selecting tags that appear on the path that receives the highest vote. This constraint application paradigm makes the outcome of the disambiguation independent of the rule sequence, and hence r...
Morphological Disambiguation of Hebrew: A Case Study in Classifier Combination
Morphological analysis and disambiguation are crucial stages in a variety of natural language processing applications, especially when languages with complex morphology are concerned. We present a system which disambiguates the output of a morphological analyzer for Hebrew. It consists of several simple classifiers and a module which combines them under linguistically motivated constraints. We ...
Publication date: 2002